What is Claimed is: 

[cl] A method of comparing a query dataset N with a subject dataset M, comprising: 

dividing said query dataset N into n N data elements having a size within a 
specified range; 

dividing said subject dataset M into n M data elements having a size within said 

specified range; 

determining a number of tasks for an entire comparison of datasets N and M as 

sending all data elements and task definitions to a master CPU of a master-slave 
distributed computing platform, 

wherein task definitions comprise at least one comparison parameter, at 
least one executable element capable of performing comparisons, a query 
LU data element ID/descriptor, and a subject data element ID/descriptor, and 

a p wherein data elements are sent alternately from query and subject data 

elements; 

O sending a task definition for each task from the master CPU to one of a plurality 

M ' of slave CPUs when all parts of a task definition and data elements referenced by 

said task definition are available at said master CPU; 

sending data elements referenced by said task definition to said slave CPU; 

performing each task on a slave CPU; and 

returning task results for each task to said master CPU. 

[c2] The method of claim [cl], further comprising randomizing sequence order of each 
dataset if either dataset contains related sequences in a contiguous arrangement. 
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[c3] The method of claim [cl], further comprising formatting said datasets so as to use 
exactly the same ambiguity substitutions. 

[c4] The method of claim [cl] wherein dividing said datasets into data elements 
further comprises: 

stripping all metadata from data; 

packing said data into an efficient structure; 

creating an index for said data and packing said index and said data in an 
uncompressed data structure; and 

compressing said uncompressed data structure into a data element using a 
redundancy reduction data compression method. 



1 [c5] The method of claim [cl], further comprising sending remaining data elements 

S from a more numerous of said datasets to said master CPU followed by all task 

definitions for otherwise complete tasks if there are fewer data elements from one 



ry 

p f* dataset. 



[c6] The method of claim [c 1] wherein performing a task on said slave CPU further 
comprises: 

ji* uncompressing and unpacking data from said query and subject data elements; 

looping through query sequences from said query data element to perform setup, 
preprocessing and table generation for each row of comparisons; 

looping through subject sequences from said subject data element and, for each 
pair of query and subject sequences, performing a comparison using said 
executable element and finding results based on said at least one comparison 
parameter; and 

storing minimal information that will allow reconstruction of said result. 
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[c7] The method of claim [c6] wherein storing said minimal information comprises: 

storing index information for said query and said subject sequence; 

storing bounds information for start and stop of said query and subject sub 
sequences; 

storing data that quantify fulfillment of significance criteria for a significant 
match; and 

storing an efficiently encoded representation of alignment between said bounds 
corresponding to a high-scoring segment pair. 

[c8] The method of claim [c7], further comprising storing a seed point and sum-set 
membership for each alignment for BLAST. 

[c9] The method of claim [c7], further comprising storing task results in a task result 
file, said file including query and subject sequence data and metadata 
corresponding to the task that the results came from, metadata for the subject 
sequence, the partial subject sequence data corresponding to the subject bounds of 
the significant alignment result, and any other results data for each result in the 
task results. 

[clO] The method of claim [c9], further comprising generating a BLAST report for each 
query data element. 

[c 1 1 ] The method of claim [c 1 0] , further comprising concatenating results from all 

BLAST reports to produce a text file identical to a blastall run of said query and 
subject datasets. 

[cl2] The method of claim [cl] wherein said datasets are selected from the group 
consisting of genomic and proteomic databases. 

[cl3] A system for comparing a query dataset N with a subject dataset M, comprising: 

a master CPU of a master-slave distributed computing platform; 
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a plurality of slave CPUs capable of communication with said master CPU; and 
a client CPU with instructions for: 

dividing said query dataset N into n N data elements having a size within a 
specified range; 

dividing said subject dataset M into n M data elements having a size within 
said specified range; 

determining a number of tasks for an entire comparison of datasets N and 
Mas n Nx n M ; 

sending all data elements and task definitions to said master CPU of a 
master-slave distributed computing platform, 

S wherein task definitions comprise at least one comparison parameter, at 

U least one executable element capable of performing comparisons, a query 

rf J 

! H data element ID/descriptor, and a subject data element ID/descriptor, and 
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wherein data elements are sent alternately from query and subject data 



yl elements; 



said master CPU comprising instructions for: 

sending a task definition for each task to one of said plurality of slave 
CPUs when all parts of a task definition and data elements referenced by said task 
definition are available at said master CPU; and 

sending data elements referenced by said task definition to said slave 
CPU; and 

said slave CPUs including instructions for: 
performing each task; and 

returning task results for each task to said master CPU. 
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[cl4] The system of claim [cl3], further comprising means for randomizing sequence 
order of each dataset if either dataset contains related sequences in a contiguous 
arrangement. 

[cl5] The system of claim [cl3], further comprising means for formatting said datasets 
so as to use exactly the same ambiguity substitutions. 

[cl6] The system of claim [cl3], wherein said instructions for dividing said datasets 
into data elements further comprises instructions for: 

stripping all metadata from data; 

packing said data into an efficient structure; 

Q 

yn creating an index for said data and packing said index and said data in an 
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uncompressed data structure; and 

compressing said uncompressed data structure into a data element using a 
redundancy reduction data compression method. 



^ [cl7] The system of claim [cl3], further comprising instructions for sending remaining 

data elements from a more numerous of said datasets to said master CPU 
p followed by all task definitions for otherwise complete tasks if there are fewer 

data elements from one dataset. 

[cl8] The system of claim [cl3], wherein instructions for performing a task on said 
slave CPU further comprises instructions for: 

uncompressing and unpacking data from said query and subject data elements; 

looping through query sequences from said query data element to perform setup, 
preprocessing and table generation for each row of comparisons; 

looping through subject sequences from said subject data element and, for each 
pair of query and subject sequences, performing a comparison using said 
executable element and finding results based on said at least one comparison 
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parameter; and 

storing minimal information that wili allow reconstruction of said resu... 
[c ,9, The sysfcm of claim [c.8], wherein said instructions for storing said minimal 
information comprises instructions for: 

storing index information for said query and said subject sequence; 
storing bounds information for start and stop of said query and subject sub 

sequences; 

storing dam that quantify fulfillment of significance criteria for a significant 

match; and 

storing an efficiently encoded representation of alignment between said bounds 
W corresponding to a high-scoring segment pair. 

[c20] v.^MM^<»^ i ~^*^~r*' 

ina^kresultfilcsaidfileincludingqueryandsubjectsequenccdautand 
metadata corresponding to the task tha, the results came from, metadata for the 
subject sequence, the partial subject sequence data corresponding to the subject 
bounds of ure significant alignment result, and any other results data for each 
result in the task results. 
[C 21] The system of claim [c20], further comprising instructions for generating a 

BLAST report for each query data element. 
m The system of claim [c21], further comprising means for concatenating results 
from all BLAST reports to produce a text file identica. to a blastall run of satd 
query and subject datasets. 

consisting of genomic and proteomic databases. 
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